Identification of Biomarkers for Lung Adenocarcinoma With Qi Deficiency and Phlegm Dampness

ABSTRACT Background Qi deficiency and phlegm dampness (QPD) is one of the most common traditional Chinese medicine (TCM) syndromes in lung adenocarcinoma (LUAD). This study aimed to identify syndrome‐specific biomarkers for LUAD with QPD syndrome. Methods Peripheral blood mononuclear cells (PBMCs) from LUAD patients with QPD, LUAD patients with non‐QPD (N‐QPD), and healthy controls (H) were collected and analyzed with RNA‐seq to identify differentially expressed genes (DEGs). The area under the receiver operating characteristic curve (AUC) of each DEG was calculated, and the 10 DEGs with the highest AUC were validated by qRT‐PCR. Logistic regression analysis was used to develop a diagnostic model, which was evaluated with AUC. Results A total of 135 individuals were enrolled in this study (training set: 15 QPD, 15 N‐QPD, 15 H; validation set: 30 QPD, 30 N‐QPD, 30 H). A total of 1480 DEGs were identified between QPD and N‐QPD. The qRT‐PCR results showed that DDR2 was downregulated and PPARG was upregulated, in line with the findings in the training set. We developed a diagnostic model with these two genes. The AUC of the diagnostic model was 0.891 in the training cohort and 0.777 in the validation cohort. Conclusions We identified two genes (DDR2 and PPARG) as syndrome‐specific biomarkers for LUAD with QPD syndrome and developed a novel diagnostic model, which may help to improve the accuracy and sensitivity of clinical diagnosis and provide a new target for natural drug treatment of LUAD.


| Introduction
Lung cancer (LC) remains the most common cancer and the leading cause of cancer-related deaths globally [1]. It is estimated that more than 234 000 new cases of LC are diagnosed and over 170 000 patients die of LC annually [2,3]. Lung adenocarcinoma (LUAD) is the primary pathological subtype of non-small cell LC (NSCLC), accounting for 50% of NSCLC. The standard therapies for LUAD are surgical resection, chemotherapy, and radiotherapy, and they have significantly improved the clinical outcomes of LUAD [4]. However, when LUAD patients are treated with chemotherapy and radiotherapy, considerable side effects often occur, such as nausea and vomiting, decreases in white blood cells, and gastrointestinal reactions [5].
Traditional Chinese medicine (TCM) has been used to treat and prevent cancer for over 2000 years. It has shown promising efficacy in cancer adjuvant therapy by improving the quality of life, reducing the side effects of radiotherapy and chemotherapy, and prolonging survival time [6,7]. Syndrome differentiation is the core theory of TCM, indicating that complementary treatment should be adopted according to different types of TCM syndromes [8]. However, it is challenging to discriminate TCM syndromes by clinical symptoms alone, and the lack of laboratory diagnosis and specific clinical biomarkers increases the difficulty. Qi deficiency and phlegm-turbid stagnation is one of the most common and typical TCM syndromes in LC (15.8-21.9%) [9]. Qi deficiency and phlegm dampness (QPD) constitution is mostly caused by weakness of the spleen and stomach [10]. The main symptoms of Qi deficiency are physical weakness, dizziness, fatigue, and pulse deficiency. The main symptoms of phlegm dampness are easy sweating, greasy skin, phlegm, and chest tightness. We have systematically formulated the theoretical view of treating malignant tumors through eliminating phlegm and detoxification, and have achieved survival benefits in LC treatment under this theoretical guidance [11,12]. Moreover, a previous study reported that phlegm-eliminating and related therapeutic methods have combined effects in cancer patients and offer a useful approach for alleviating the side effects of both radiotherapy and chemotherapy [13]. Meanwhile, a phlegm-eliminating prescription has been shown to remarkably inhibit the growth and metastasis of Lewis lung carcinoma in mice [14]. The diagnosis of TCM syndromes relies on the subjective judgment of experienced TCM practitioners and lacks an accurate diagnostic strategy. The combination of TCM and modern molecular medicine will contribute to the accurate diagnosis of TCM syndromes.
Recently, omics technology has provided a novel way to systematically understand the potential mechanisms of cancers at the molecular level and could be utilized for identifying novel biomarkers for cancer treatment and diagnosis. In the present study, we collected peripheral blood mononuclear cells (PBMCs) from LUAD patients with QPD, LUAD patients with non-QPD (N-QPD), and healthy individuals (H). We analyzed them with RNA-seq to identify syndrome-specific biomarkers for LUAD with QPD. Then, qRT-PCR was used to validate the syndrome-specific biomarkers, and logistic regression analysis was used to develop a diagnostic model, which may help provide evidence-based medical support for developing TCM theory. A detailed flow chart of this study is shown in Figure 1.

| Criteria for LUAD and TCM Syndrome
Between July 2016 and June 2018, a total of 135 subjects (training cohort: 15 QPD, 15 N-QPD, 15 H; validation cohort: 30 QPD, 30 N-QPD, 30 H) were enrolled from three independent research centers (Tongde Hospital, Cancer Hospital of University of Chinese Academy of Sciences, and Taizhou Hospital).
The TCM syndrome was determined from its characteristics according to the Chinese Medical Constitution 2008, issued by the Chinese Association of TCM, and the TCM Clinical Diagnosis and Treatment Terminology. Meanwhile, the results were confirmed by three professors of TCM. This study was approved, with informed consent, by the ethics committees of Tongde Hospital, Taizhou Hospital, and Cancer Hospital.

| Sample Collection and PBMC Isolation
After overnight fasting, 10 mL of peripheral blood was collected in an EDTA tube and stored at −80°C. Before analysis, the peripheral blood samples were thawed to room temperature. An aliquot of 300 μL of each plasma sample was placed in a 1.5 mL Eppendorf tube, and 900 μL of FACS Lysing Solution was added. After 10 min, each sample was centrifuged at 3000 rpm for 15 min. Then, the supernatant was removed, and the residue was resuspended in a solution (90% FBS and 10% DMSO) and stored in liquid nitrogen.

| RNA-seq Analysis
Qubit 2.0 (Life Technologies, USA) and Bioanalyzer 2100 (Agilent, Germany) were used to analyze the quality and integrity of RNA. For library preparation, 3 μg of total RNA was captured by NEBNext Oligo d(T)25 beads (NEB, USA), sheared to yield fragments of ~250 bp, and reverse-transcribed with the NEBNext RNA First Strand and Second Strand Synthesis Modules (NEB, USA). The products were end-repaired, A-tailed, ligated to Illumina sequencing adapters, and amplified by PCR. The sequencing library was qualified by Qubit 2.0 (Life Technologies, USA) and Bioanalyzer 2100 (Agilent, Germany), then sequenced on an Illumina HiSeq X Ten with 2 × 150 bp paired-end reads, with runs controlled by HiSeq Control Software (HCS).

| Functional Enrichment Analysis
To assess the biological functions of DEGs in TCM syndrome, we performed functional enrichment analysis with GeneAnalytics. Score > 10 and gene count > 5 were the cut-off criteria.

| qRT-PCR
The TRIzol reagent (Invitrogen) was used to isolate total RNA according to the manufacturer's instructions. The Prime Script RT reagent kit (TaKaRa, Dalian, China) was used to reverse-transcribe RNA to complementary DNA. Then, the SYBR Green master mix (TaKaRa) was used to perform PCR with glyceraldehyde 3-phosphate dehydrogenase and U6 as the internal controls. The level of mRNAs was analyzed by the comparative 2^−ΔΔCt method.
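The comparative 2^−ΔΔCt calculation can be illustrated with a minimal Python sketch. The function name and all Ct values below are hypothetical and are not taken from this study; the sketch assumes a single reference gene and a single calibrator sample.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_calibrator, ct_ref_calibrator):
    """Fold change of a target gene by the comparative 2^-ddCt method."""
    # Normalize the target gene to the reference gene within each sample.
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_calibrator = ct_target_calibrator - ct_ref_calibrator
    # Compare the sample of interest with the calibrator sample.
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: the target amplifies two cycles later in the
# sample than in the calibrator, with identical reference-gene Ct values.
fold = relative_expression(26.0, 18.0, 24.0, 18.0)
print(fold)  # 0.25, i.e. a four-fold lower level than the calibrator
```

A value below 1 indicates lower expression than in the calibrator, and a value above 1 indicates higher expression.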

| Statistical Analysis
Principal components analysis (PCA), linear discriminant analysis (LDA), and kernel density estimation (KDE) were performed with MATLAB (R2015b). The area under the receiver operating characteristic curve (AUC) was determined with the package "pROC" in R 4.0.2 (http://r-project.org). Logistic regression analysis was used to develop a diagnostic model with the package (glmt).
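For readers unfamiliar with the AUC, it equals the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one. The sketch below computes it by direct pairwise comparison in plain Python. It is an illustration only, not the pROC implementation used in this study, and the expression values are hypothetical.

```python
def auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive sample scores higher; ties count as half a win."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical expression values of one gene in two groups.
qpd = [2.1, 3.4, 1.9]
non_qpd = [1.0, 2.0, 1.5]
print(round(auc(qpd, non_qpd), 3))  # 0.889 (8 of 9 pairs ordered correctly)
```

A marker that is lower in the positive group yields a value below 0.5 under this definition; in that case 1 − AUC describes its discriminative ability.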

| Sample Size Estimation
The sample size was calculated with MedCalc software. With a sample size ratio of 1:1 between the negative and positive groups and a power of 0.9, a minimum total sample size of 22 was required to achieve the expected performance (AUC = 0.85). Considering a dropout rate of about 10%, the sample size of the cohorts was finally set as follows: the training group (n = 45) with 15 H, 15 QPD, and 15 N-QPD; the validation group (n = 90) with 30 H, 30 QPD, and 30 N-QPD.

| Clinical Characterization of the Study Population
The clinical and pathological features of all subjects are presented in Table 1.

| Identification of DEGs
After aligning reads to the reference genome and discarding low-quality reads, the median number of clean reads was 9.09 million, with 89.94% of clean reads mapped to the human genome. PCA demonstrated a clear boundary between LUAD samples and normal samples, and the three groups (QPD, N-QPD, and H) gathered in different areas (Figure 2A). Hierarchical clustering divided the QPD group into Clusters 1 and 2 (Figure 2B). The LDA and KDE showed a separation trend between Clusters 1 and 2, and a corresponding peak was present (Figure 2C). We tried to find the reason for the differentiation of the two clusters by comparing their clinical information (gender, age, tumor type, and TNM stage). However, there was no significant difference in clinical factors between the two clusters. We identified DEGs between different groups using "DESeq2." The cut-off values were set at |logFC| > 2 and FDR < 0.05. A total of 1480 DEGs were identified between the QPD and N-QPD groups (Figure 2D).
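The cut-off step can be sketched as follows. This is a minimal illustration, not the DESeq2 workflow itself: it assumes the fold changes are on a log2 scale, that the FDR is a Benjamini-Hochberg adjusted p value, and it uses hypothetical gene names and values.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(genes, log2fc, fdr, lfc_cutoff=2.0, fdr_cutoff=0.05):
    """Keep genes with |log2FC| > lfc_cutoff and FDR < fdr_cutoff."""
    return [g for g, lfc, q in zip(genes, log2fc, fdr)
            if abs(lfc) > lfc_cutoff and q < fdr_cutoff]


# Hypothetical results for four genes.
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
log2fc = [2.5, -3.1, 1.2, 2.8]
pvalues = [0.001, 0.002, 0.0005, 0.6]
fdr = benjamini_hochberg(pvalues)
print(select_degs(genes, log2fc, fdr))  # ['GENE_A', 'GENE_B']
```

GENE_C fails the fold-change cut-off and GENE_D fails the FDR cut-off, so only two of the four hypothetical genes are retained.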

| Functional Enrichment Analysis
For a better understanding of the function of DEGs, gene enrichment analysis was performed with GeneAnalytics. As shown in Figure 3, several pathways were enriched in the QPD group, including CTLA4 signaling, GPCR pathway, ICOS-ICOSL pathway in T-helper cell, TCR signaling, and T cell receptor signaling pathway. In addition, six signaling pathways were enriched in Cluster 1, and seven were enriched in Cluster 2.

| Identification of Biomarkers for QPD Syndrome
Then, the AUC of each DEG was calculated to evaluate its diagnostic effectiveness, and the 10 DEGs with the highest AUC were picked as potential biomarkers for QPD syndrome. Moreover, DEGs highly related to already selected biomarkers (p < 0.05) were discarded to ensure that the biomarkers provide as much complementary information as possible. The 10 DEGs with the highest AUC (ANK2, B3GNT7, DDR2, EIF5, GPER1, METTL7B, MKRN3, MRAS, PPARG (peroxisome proliferator-activated receptor gamma), and TMEM41A) were selected as the potential biomarkers in the QPD group and are presented in Figure 4.
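One way to read this selection procedure is as a greedy filter: rank the genes by AUC, then accept each gene only if it is not strongly correlated with a gene already accepted. The sketch below follows that reading under stated assumptions. It substitutes a Pearson |r| threshold (`max_abs_r`, a hypothetical parameter) for the p < 0.05 relatedness criterion described above, and the gene names and values are invented for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pick_markers(auc_by_gene, expr_by_gene, top_n=10, max_abs_r=0.8):
    """Greedily pick up to top_n genes in descending AUC order, skipping
    genes whose |r| with any already picked gene reaches max_abs_r."""
    picked = []
    for gene in sorted(auc_by_gene, key=auc_by_gene.get, reverse=True):
        if all(abs(pearson(expr_by_gene[gene], expr_by_gene[p])) < max_abs_r
               for p in picked):
            picked.append(gene)
        if len(picked) == top_n:
            break
    return picked


# Hypothetical AUCs and expression profiles; G2 is nearly collinear with G1.
aucs = {"G1": 0.95, "G2": 0.93, "G3": 0.90}
expr = {
    "G1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "G2": [2.0, 4.0, 6.0, 8.0, 10.5],
    "G3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
print(pick_markers(aucs, expr, top_n=2))  # ['G1', 'G3']
```

G2 has the second-highest AUC but is dropped because it adds little information beyond G1. Genes with constant expression would need to be removed beforehand, since their correlation is undefined.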

| Validation of Biomarkers for QPD Syndrome by qRT-PCR
To verify the expression levels of the 10 candidate biomarkers in QPD syndrome, we carried out qRT-PCR in the validation set. There was a significant difference in the expression of five genes (EIF5, TMEM41A, MRAS, DDR2, and PPARG) between the QPD and the N-QPD groups. The expression of three genes (EIF5, TMEM41A, and MRAS) was upregulated in the QPD group, which was inconsistent with the results of RNA-seq (Figure 5). In contrast, DDR2 was downregulated and PPARG was upregulated, in line with the findings in the training set (Figure 6A-D). The AUCs of DDR2 and PPARG in the validation set are shown in Figure 6E,F.

| Development and Validation of a Diagnostic Model for QPD Syndrome
Finally, we applied DDR2 and PPARG to develop a diagnostic model using logistic regression analysis. The diagnostic formula was as follows. The AUC of the diagnostic model in the training set and validation set was 0.891 and 0.777, respectively (Figure 7).
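The general form of a two-gene logistic regression score can be sketched as below. The fitted coefficients of the diagnostic model are not reproduced here, so the intercept and coefficients in the example are placeholders chosen only to reflect the reported directions (DDR2 lower and PPARG higher in QPD); they are assumptions, not the study's values.

```python
import math


def qpd_probability(ddr2, pparg, intercept, coef_ddr2, coef_pparg):
    """Predicted probability of QPD from a two-gene logistic model."""
    linear = intercept + coef_ddr2 * ddr2 + coef_pparg * pparg
    return 1.0 / (1.0 + math.exp(-linear))


# Placeholder coefficients (NOT the fitted values from this study).
params = {"intercept": 0.0, "coef_ddr2": -1.0, "coef_pparg": 1.0}

print(qpd_probability(1.0, 1.0, **params))            # 0.5
print(round(qpd_probability(0.5, 2.0, **params), 3))  # 0.818
```

With these placeholder signs, lower DDR2 and higher PPARG expression both push the predicted probability toward QPD; a classification threshold would then be chosen from the ROC curve.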

| Discussion
Comprehensive treatment or multimodal treatment has been a research hotspot in the field of cancer in recent years. In China, TCM, as a comprehensive therapy, is a common choice to improve patients' quality of life. Integrated traditional Chinese and Western medicine treatment has been reported to effectively improve some symptoms of patients with non-small cell LC [15]. Syndrome differentiation is the core concept of TCM therapy, which is used to recognize and analyze disease syndromes. The syndrome type of patients with stage I-III LC is mainly Qi deficiency of the lung and spleen, while the syndrome type of patients with stage IV LC is mainly deficiency of both Qi and Yin. The syndrome is characterized by phlegm-heat addiction and toxic internal obstruction, liver-fire, and liver-yang excess. At the same time, the deficiency is mainly caused by kidney and brain deficiency, liver-yin deficiency, and spleen and stomach dysfunction. It is mostly caused by phlegm turbidity, blood addiction, and toxic pathogens blocking the brain collaterals and is closely related to the liver, spleen, and kidney. The clinical treatment of QPD focuses on clearing the orifices and dissipating phlegm, strengthening the body and cultivating the essence, clearing and reducing turbidity, promoting Qi, and promoting blood circulation. However, due to a lack of laboratory diagnosis and typical clinical biomarkers, the discrimination of TCM syndromes relies only on clinical symptoms. Transcriptomics applies chip technology to explore relevant gene expression profiles, screen credible biomarkers, and provide new ideas that help explain the essence of TCM syndromes at the microscopic level [16].
In this study, we collected PBMCs from LUAD patients with QPD syndrome and depicted gene expression profiles with RNA-seq in the training set. We identified DEGs between groups using "DESeq2" and found 1480 DEGs between the QPD and N-QPD groups. Functional enrichment analysis demonstrated that DEGs were mainly enriched in immune-related signaling pathways. We selected the 10 genes with the highest AUC (ANK2, B3GNT7, DDR2, EIF5, GPER1, METTL7B, MKRN3, MRAS, PPARG, and TMEM41A) as potential biomarkers for further analysis and then tested them in the validation set. The results showed that in QPD syndrome patients, DDR2 (discoidin domain receptor tyrosine kinase 2) was downregulated and PPARG was upregulated in both sets, which suggests that DDR2 and PPARG may be potential biomarkers for QPD syndrome and could be used for its clinical diagnosis. DDR2 encodes a member of the discoidin domain receptor subclass of the receptor tyrosine kinase protein family, which plays an important role in the communication of cells with their microenvironment [17,18]. The activation of DDR2 is highly related to many cellular phenotypes, such as proliferation, migration, transformation, and differentiation. Previous studies reported that DDR2 mutation could influence the progression of LC by reducing the growth-inhibitory effect of collagen [19,20]. PPARG encodes the protein PPAR-gamma, a member of the peroxisome proliferator-activated receptor subfamily of nuclear receptors, which participates in the pathology of numerous diseases such as obesity, diabetes, atherosclerosis, and cancer [21,22]. It has been reported that upregulated expression of PPARG was positively associated with a better prognosis of LUAD and could drive multiple molecular triggers against the pathologic development and prognosis of LUAD [23,24].
Molecular biology reveals the nature of biological phenomena at the molecular level, and studying the material basis of life is its main task [25]. Molecular biology has opened up a new way to further understand the inherent basic laws of life phenomena [26,27]. Although the theories of TCM and molecular biology are the products of different thought systems and times, the material basis of life activities discussed by the two sciences is the same. Therefore, it is of great significance to introduce molecular biology into the study of TCM and to study TCM at the molecular and genetic levels, in order to promote the combination of TCM and modern medicine and realize the modernization of TCM.
In this study, we developed a diagnostic model with DDR2 and PPARG using logistic regression analysis. The AUC of the diagnostic model in the training and validation cohorts was 0.891 and 0.777, respectively, suggesting that this diagnostic model was suitable for the diagnosis of QPD syndrome in LUAD. We identified DDR2 and PPARG as potential biomarkers for LUAD with QPD syndrome and developed a practical diagnostic model. The result enriches the diagnostic methods of TCM syndrome differentiation, which may help to improve the accuracy and sensitivity of clinical diagnosis and provide a new target for natural drug treatment of LUAD. Future studies should explore the clinical translational potential of this method in a large multicenter cohort, with a view to achieving accurate TCM diagnosis and treatment of LC earlier.

FIGURE 2 | Identification of differentially expressed genes (DEGs). (A) PCA among three groups in the training cohort. (B) Hierarchical clustering divided the QPD group into Cluster 1 and Cluster 2. (C) The LDA and the KDE showed multiple peaks between Cluster 1 and Cluster 2. (D) DEGs between different groups. H: healthy individuals; N-QPD: non-Qi deficiency and phlegm dampness; QPD: Qi deficiency and phlegm dampness lung adenocarcinoma patients.

FIGURE 6 | Validation of biomarkers for QPD syndrome by qRT-PCR. (A) The expression of DDR2 in the training cohort and (C) in the validation cohort. (B) The expression of PPARG in the training cohort and (D) in the validation cohort. (E) The AUC of DDR2 for QPD syndrome in the validation cohort. (F) The AUC of PPARG for QPD syndrome in the validation cohort.

FIGURE 7 | The AUCs of the diagnostic model in the training cohort and validation cohort.

TABLE 1 | The baseline characteristics of patients in this study.